perm filename WIZARD.TXT[SS,SYS]12 blob
sn#744860 filedate 1984-03-14 generic text, type C, neo UTF8
COMMENT ⊗ VALID 00021 PAGES
C REC PAGE DESCRIPTION
C00001 00001
C00004 00002 This file documents various WAITS features and facts that system
C00006 00003 Patching the system is done with FRAID and UEDDT and SYSFIX. To patch the
C00010 00004 The WHO program is capable of displaying one or more blocks of data from
C00017 00005 The PK program is capable of displaying the text in the system disaster
C00019 00006 Flags set automatically upon Reload: PDP-10 SW register.
C00025 00007 KLDCP: compiling it, putting it on tape, 11LOAD to reload the 11
C00028 00008 The correct way to start DDT on the console 11 is the following:
C00029 00009 How to put up new DSKDMP[SS,SYS] and new BOOT on console-11 dectape:
C00033 00010 HOWTO unwedge the MagTape service:
C00034 00011 The program DEV[SS,SYS] types out the names of all devices in the DDB chain,
C00035 00012 The program CORMON[SS,SYS] types out current core usage parameters.
C00036 00013 The program UP2DDT[SS,SYS] runs P2 EDDT from a P1 terminal, with various
C00037 00014 The program P2LOAD[1,2] can mark P2 memory up or down, reload P2 from any
C00041 00015
C00043 00016 A Tale of Two CTYs
C00047 00017 WSYNC
C00048 00018 How To Find a bad RAM chip in the Ampex DC830 disk controller.
C00060 00019 ∂07-Nov-82 1831 ME new DART up, w/RETRY <number>
C00063 00020 SYSSAV: To save a copy of the running system in a file, do the following:
C00065 00021 ∂09-Sep-83 1027 JJW@S1-A Remind queue
C00073 ENDMK
C⊗;
This file documents various WAITS features and facts that system
programmers may need or want to know, but which are of little or no use to
the general user community. Some special programs for system programmers
are documented here.
Some pieces of the WAITS operating system have their internal workings
documented in the files *.DOC[SS,SYS].
WIZARD.TXT is intended to include all the common techniques of managing
the system. This file provides a place to record anything that may be of
use to future (and present) system programmers, whenever such information
happens to be found or needed online.
Patching the system is done with FRAID and UEDDT and SYSFIX. To patch the
system DMP file (used upon reloads), R FRAID and specify WAITS.DMP[S,SYS].
(FRAID allows you either to just look at the file using its symbols or to
alter it.) To patch the running system, R UEDDT, then type CALL and
REENTER. (You must have the ACW privilege.) To copy a patch from the
DMP file to the running system, use SYSFIX. (Again, you must have ACW.)
R FRAID gives you a version of RAID that allows you to examine and/or
change a file instead of a job's core image; the symbols contained in a
DMP file being changed are utilized.
R UEDDT gives you User Exec DDT, and allows you to examine and/or
change the running system, using DDT and the exec's symbols. You must
REENTER UEDDT before you can do deposits into the system. Special UEDDT
commands are documented on the last page of the DDT source, DDT[S,SYS].
The recommended way to install a permanent patch is first to use FRAID on
the DMP file, and then to use SYSFIX (see later page in this file) to copy
the patch into the running system. You must, of course, be careful to
copy the pieces of it to the running system in an order that won't crash
the running system. (SYSFIX allows copying the PATCH area first, in order
to make patching in the right order easier. Unfortunately, SYSFIX doesn't
provide the convenience of symbols, other than PATCH.)
The program SYSFIX[SS,SYS] compares the protected part of the system
against the corresponding part of WAITS.DMP[S,SYS], to look for system
clobberage (e.g., by the III processor or parity errors) and/or patched
changes in the DMP file. Thus found, such clobberage can be fixed and
patches can be copied to the running system, with SYSFIX, using the
REENTER monitor command.
RUN SYSFIX[SS,SYS] and the program will type out all words that do not
match between system core and the DMP file. Note that only the areas from
PATCH through PATCH+77 and CHKBEG to CHKEND are checked.
If you REENTER the program, then you will be allowed to selectively "fix"
any word that differs by taking the value from disk and placing it in
core. Note that corrections are made in precisely the order that commands
to fix them are given and that PATCH is checked first (in case you want to
copy a patch from the DMP file into core).
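The comparison described above can be pictured with a minimal Python
sketch. This is not SYSFIX's code: the function name, the dictionaries
standing in for system core and the DMP file, and the numeric addresses
are all hypothetical, and treating CHKEND as inclusive is an assumption.

```python
def mismatches(core, dmp, patch, chkbeg, chkend):
    """Yield (address, core_word, dmp_word) for words that differ.

    The PATCH area (PATCH through PATCH+77 octal) is scanned first,
    then CHKBEG through CHKEND (assumed inclusive here).
    """
    ranges = [(patch, patch + 0o77), (chkbeg, chkend)]
    for first, last in ranges:
        for addr in range(first, last + 1):
            c, d = core.get(addr, 0), dmp.get(addr, 0)
            if c != d:
                yield addr, c, d


# Hypothetical addresses and contents, for illustration only.
core = {0o1000: 0, 0o5000: 0o254000001234}
dmp = {0o1000: 0o254000005000, 0o5000: 0o254000001234}
for addr, c, d in mismatches(core, dmp, patch=0o1000, chkbeg=0o4000, chkend=0o5777):
    print(f"{addr:o}/ core {c:012o}  dmp {d:012o}")
```

Scanning the PATCH range first mirrors the order in which the words are
offered for fixing, so a patch body can be copied before the
instruction that jumps to it.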
The WHO program is capable of displaying one or more blocks of data from
tables within the system. WHO has two display modes: WHO mode (displays
jobs and files) and TABLE mode (displays system tables). The commands for
moving the displayed text around on the screen are the same in both modes.
(Thus, you can request display of more tables than fit on the screen and
then move the window around to see different pieces of it, just like with
the normal WHO display.)
Below are the commands that make WHO display selected system table(s).
Note: For both typed commands and commands given inside an indirect file,
any command that does not include a symbol name applies to the table whose
symbol name was last given.
αβ! Go into TABLE display mode (display system tables)
! Return to WHO mode (display jobs and files)
*<symbol> display the one table starting at location named by <symbol>
+<symbol> add <symbol> to list of tables being displayed
-<block> use symbol value inside this block, in case last
symbol entered is multiply defined
+<octal nbr> display this many cells of the table (default is 1 word)
+<octal nbr>/ assume last symbol typed in has this octal value
This lets you define your own symbolic tables as you go,
by saying +<symbol><crlf>+<symbol value>/<crlf>.
"<octal nbr> start displaying the table with this octal offset from
the beginning of the table (forced to multiple of 8)
/ select the next symbol lower on screen as "current"
F display the current table in full-word format
L display the current table in left-halfword format
R display the current table in right-halfword format
You can type just "!" (not αβ!) in the monitor command that starts WHO, in
order to get into TABLE mode immediately. (Otherwise, WHO normally starts
in normal job-display mode.) For example, W!@MACS<cr>.
All of the system's main data areas are available for display by WHO; this
includes both the cached and uncached data areas in lowcore plus all of
free storage. However, the write-protected parts of the system (code) are
not accessible from WHO (since they're not mapped into WHO's upper
segment). Undefined symbols and symbols with address values outside the
available range will be displayed followed by question marks.
In either of the half-word formats, eight data words are displayed per
line; in full-word format, four data words are displayed per line. Data
words that contain zero are not displayed, as an aid to picking out the
non-zero data, and if all the words to be displayed on a given line are
zero, then the whole line is omitted.
If only one word of a given table is to be displayed (this is the
default), then that word is always displayed on the same line as the name
of the word and in full-word format.
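The layout rules above (words per line, blanked zero words, omitted
all-zero lines) can be sketched in Python as follows. Only the rules
come from this writeup; the function name, the offset label at the start
of each line and the exact spacing are assumptions, not WHO's real
output format, and the single-word default case is not modelled.

```python
def table_lines(words, mode="F"):
    """Lay out table words following the rules described above.

    mode "F" shows full words, four per line; "L" and "R" show the left
    or right half of each word, eight per line.  Words that are zero are
    left blank, and a line whose words are all zero is dropped.
    """
    per_line = 4 if mode == "F" else 8
    width = 12 if mode == "F" else 6
    lines = []
    for start in range(0, len(words), per_line):
        group = words[start:start + per_line]
        if not any(group):
            continue  # whole line is zero: omit it
        fields = []
        for word in group:
            shown = word
            if mode == "L":
                shown = word >> 18
            elif mode == "R":
                shown = word & 0o777777
            fields.append(f"{shown:0{width}o}" if word else " " * width)
        lines.append(f"{start:o}/ " + " ".join(fields))
    return lines


sample = [0, 0, 0, 0, 0o1000002, 0, 0, 0]
for line in table_lines(sample, "F"):
    print(line)
```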
In an indirect file (e.g., file used with @ command, with default
extension .DIS), to select which tables should be displayed, use the
following commands (you must already be in TABLE mode, with αβ! or W !
typed manually). The default PPN in TABLE mode is [SS,SYS] (in normal
mode the default PPN is [P,DOC]). Thus, useful indirect files for setting
up the display of interesting system tables can be kept on [SS,SYS].
<symbol> add <symbol> to list displayed (note no + or *)
-<block> last symbol entered is from this block
+<octal count> display this many cells of the table
+<octal value>/ assume last symbol typed in has this octal value
This lets you define your own symbolic tables as you go,
by saying <symbol><crlf>+<symbol value>/<crlf>.
"<octal offset> display table with this octal offset from table beginning
= display the table in full-word format
< display the table in left-halfword format
> display the table in right-halfword format
When WHO is initially put into TABLE display mode (and whenever the
N[ormalize] command is given while in TABLE mode), there is a default
group of tables displayed. This default group may vary from time to time
as system requirements change. However, using indirect files is a good
way to repeatedly view selected tables.
-- ME 20 Oct 80
The PK program is capable of displaying the text in the system disaster
buffer (whose text gets typed out on the CTY). There are two ways of
having this text presented: type either αβ! or !<cr> when asked by PK for
a TTY number.
If you type αβ!, then the disaster text will be displayed -- if you have a
small display (e.g., DM) then the most recent text may not fit on the
display.
If you type !, then the disaster text will be typed out on the page
printer (piece of paper number 1, not 0), so that the most recent text
will be guaranteed to stay on the screen. However, on DDs and IIIs, this
method is much slower.
So, αβ! is recommended for DDs and IIIs, and ! is recommended for DMs.
Also, the αβ+ and αβ- commands can be used (in αβ! or ! modes) to add or
remove the blank lines that normally appear on the CTY, thus compressing
or uncompressing the display presentation.
Flags set automatically upon Reload: PDP-10 SW register.
When WAITS is started upon reload, it looks at the PDP-10 switch register
(36 bits) to determine if certain flags should be set automatically. If
the 37,,0 bits in the SW register are all on, then the right-half SW bits
indicate system flags to be set and the remaining left-half bits indicate
the time that the system should announce as when it will come up (if down).
To set the 37,,0 bits in the SW register, use the KLDCP command
SW 3 600000
which sets the SW register to 3,,600000. The remaining left-half bits
(777774,,0 bits) are read continuously from the PDP-11 switches labelled
15 to 0 on the front of the "KL-10" console (the two high-order switches,
labelled 17 and 16, are not used).
To make the system set flags from SW upon reload (necessary to reload the
system in "down" mode), set the right-most 3 PDP-11 switches up (switches
for bits 2 through 0); these three PDP-11 switches represent the 34,,0
bits in the SW register.
To set the time that the system will come up, put the hour (00 to 23) in
the five PDP-11 switches labelled 13 through 9, and put the minutes in the
six 11-switches 8 to 3.
If switch 15 is up, then the coming-up time is "approximate".
If switch 14 is up, then the coming-up time is "Who knows when".
If switches 13 and 12 are both up, then the down message is taken from
the patchable location UPMSG5 (which should start with one or two
spaces, to delimit the preceding message "System is down").
If switches 8 through 3 are all up, then the down message says "Up
shortly!" instead of at any specific time.
To reload the system "down", the 0,,600000 bits should be on in the
SW register (in addition to the 37,,0 bits) to make the system set
the MAINTMODE and TTYLOK flags. Since this is the most common usage
of the automatic flag-setting feature, the SW register is normally
left set to 3,,600000 so that only the 3 low PDP-11 switches need to
be toggled to reload the system up or down.
However, sometimes it is desirable to make every reload set some other
flag(s), whether the system is being reloaded up or down. In this case
the appropriate SW register flags can be set with the SW command (in
KLDCP). Here are the meanings of the various SW bits (the table defining
these bits is at MCELTB in SYSINI, and it is looked at around SYSIN1 in
SYSINI; some special bits for DDT are looked at in SYIDDT). Of course,
the 0,,600000 bits should be off when reloading the system up and on when
reloading down, and the 37,,0 bits must be on to make the following bits
be looked at.
Right-
Half     Flag
SW Bit   Word     Meaning
-------  -------  --------
      1           go to EDDT immediately, before really starting the system
      2           don't keep DDT around (save core, especially for free storage)
     10  CORNXM   initial core NXMs are expected, proceed automatically from them
     20  NOP2     don't use P2 memory (allows P2 to be debugged "offline" from WAITS)
     40  SWPCH2   checksum every swap op after swap out
    100  SWPCHK   checksum every swap op (after swap in)
    400  BLTSWP   do BLT after swapin
   1000  IMPPMS   allow IMP to complain on CTY
   2000  IMPDIE   keep WAITS ArpaNet service down
   4000  NOLOGIN  don't allow anyone to log in
  10000  EXPMOD   unused
  20000  DEBMOD   make certain errors stop in DDT instead of just typing CTY msg
  40000  IIIOFF   don't run the IIIs (on F2, this is LITOFF, disables job in lights)
 100000  DDOFF    don't run the Data Disc
 200000  MAINTM   don't run phantoms, mark system as down
 400000  TTYLOK   disable most terminals, mark system as down
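
As a cross-check on the bit assignments above, here is a minimal Python
sketch that decodes the two halves of the SW register. It is an
illustration, not the code at MCELTB or SYSIN1: the function name and
the dictionary keys are made up, the labels for bits 1 and 2 (which
have no flag word in the table) are invented, and no attempt is made to
say which down message wins when several of the special switch
settings are combined.

```python
# Right-half SW bits from the table above (octal).  The first two bits
# have no flag word in the table; their labels here are made up.
RH_FLAGS = {
    0o1: "(EDDT first)", 0o2: "(discard DDT)",
    0o10: "CORNXM", 0o20: "NOP2", 0o40: "SWPCH2", 0o100: "SWPCHK",
    0o400: "BLTSWP", 0o1000: "IMPPMS", 0o2000: "IMPDIE",
    0o4000: "NOLOGIN", 0o10000: "EXPMOD", 0o20000: "DEBMOD",
    0o40000: "IIIOFF", 0o100000: "DDOFF", 0o200000: "MAINTM",
    0o400000: "TTYLOK",
}


def decode_sw(left, right):
    """Decode the SW halves; return None unless the 37,,0 bits are all on."""
    if (left & 0o37) != 0o37:
        return None
    switches = left >> 2  # PDP-11 switches 15..0 are the 777774,,0 bits
    minute = (switches >> 3) & 0o77  # switches 8 through 3
    return {
        "flags": [name for bit, name in RH_FLAGS.items() if right & bit],
        "hour": (switches >> 9) & 0o37,  # switches 13 through 9
        "minute": minute,
        "approximate": bool(switches & (1 << 15)),
        "who_knows_when": bool(switches & (1 << 14)),
        "use_upmsg5": (switches & 0o30000) == 0o30000,  # switches 13 and 12
        "up_shortly": minute == 0o77,
    }


# Reload "down" until about 14:30: low three switches up, SW 3 600000.
switches = (14 << 9) | (30 << 3) | 0o7
print(decode_sw((switches << 2) | 0o3, 0o600000))
```

With the three low PDP-11 switches down, the left half fails the 37,,0
test and decode_sw returns None, which corresponds to a reload that
ignores the flag bits.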
KLDCP: compiling it, putting it on tape, 11LOAD to reload the 11
;To compile KLDCP:
.AL KL,SYS
.R PALX
*KLDCP/H
[ignore about 7 byte-too-large errors]
*↑C
.RU 11LOAD
*LKLDCP
*WKLDCP ; write it out with DDT
*BKLDCP ; write out a .L11 file for reloading into 11 using αX GRONK cmd.
*
;To put KLDCP on Dectape (if the Dectapes are working):
.R FILEX
*DTA:(VL)←KLDCP.BIN/I
*
The command 11LOAD, given on the CTY, will reload KLDCP into the
console-11. This command can also be given by a user job with the DEV
privilege. If no arguments are given, then a phantom (or attached job if
system is in maintenance mode) will be started which will automatically
reload the console-11 from KLDCP.L11[KL,SYS]. If you want to reload the
11 from a different file, then give the CTY command:
.11LOAD;AGRONK
and you will be asked for the L11 filename.
11LOAD runs 11LOAD[1,2], which is a special version of 11LOAD[KL,SYS].
The [1,2] version will accept other 11LOAD commands provided the 11LOAD
command is followed immediately by a semicolon and then the command you
want. This version of 11LOAD is compiled with the assembly switch FTQUIK,
and in some cases it will exit rather than wait for a command to be typed
(hence the ";AGRONK" form of commands) since it usually runs as a phantom.
The 11LOAD command only works on the CTY and is provided so that normal
users can reload the console-11 without having any privileges, whenever
that is necessary.
The old way (still works, same effect as 11LOAD cmd above) for reloading
KLDCP from WAITS is:
.al kl,sys
.ru 11load
TYPE ? FOR HELP
*aN EXTENDED COMMAND gronk
Reload the 11 via DTELOD UUO. L11 File - kldcp
11-Image is loaded into 10-memory.
Next, we gronk the 11 via DTELOD
According to the system, we're done now.
*
The correct way to start DDT on the console 11 is the following:
Set HALT/ENB to halt.
Set switches to 777707 and press EXAMINE.
Record the data. It is the PC at which the program
should be restarted via nnn$G in DDT.
Set switches to 56000 (the starting address of DDT)
Lift DEPOSIT, storing 56000 into the PC
Set HALT/ENB to ENB
press Continue
This is for debugging, e.g., a KLDCP that has crashed on the 11.
How to put up new DSKDMP[SS,SYS] and new BOOT on console-11 dectape:
The /D is a switch you need to give to CNVRT[KLM,SYS] to cause it to
make the right format BOOT.A10 (which you then rename to BOOT.D10).
The L switch types out the directory after it's done.
You could read the old BOOT off the tape with
BOOT.OLD←DTA2:BOOT.D10(VA)
.AL SS,SYS
.LOAD DSKDMP%S%B%? ;use proper assembly switch settings for DSKDMP
.SAVE DSKDMP 8 ;save whole core image with symbols (without DDT)
.LOAD DSKDMP%S%B%? ;use proper assembly switch settings for BOOT
.SAVE BOOT ;save core image
.SAVE TBOOT 4 ;save whole core image with symbols (without DDT) too
.RU CNVRT[KLM,SYS]
*BOOT/D
↑C
.RU CNVRT[KLM,SYS]
*TBOOT/D
↑C
.REN BOOT.D10←BOOT.A10
;Now mount console-11 dectape on DTA2
.R FILEX
*DTA2:(VAL)←BOOT.D10
↑C
.
There are currently (20 Oct 80) several versions of BOOT on the
console Dectape. They are named by adding either an N or an O
and/or a 1 to "BOOT", to get BOOT,NBOOT,OBOOT,BOOT1,NBOOT1,OBOOT1.
The BOOTy versions will boot DSKDMP.DMP[SS,SYS] (currently assumes two channels).
The NBOOTy versions will boot DSKDMP.NEW[SS,SYS] (assumes only one channel).
The OBOOTy versions will boot DSKDMP.OLD[SS,SYS] (doesn't exist).
The xBOOT versions assume that there are two disk channels (and two controllers).
The xBOOT1 versions assume that there is one disk channel (and one controller).
DSKDMP.FAI[SS,SYS] is the source file for both DSKDMP and BOOT. See the
various assembly switches on page two.
The xBOOTy files are loaded into the 11 with the KLDCP command LD,
e.g., LD NBOOT1<cr>. The DSKDMP program can then be loaded into the 10
with the KLDCP command DS.
We are currently (20 Oct 80) using NBOOT1 (LD NBOOT1) since only one
controller (controller number 1) is working.
Note that both BOOT and DSKDMP have compiled-in tables telling which
C1 channel (controller) each disk drive is on (under IFE RHDSK).
HOWTO unwedge the MagTape service:
The normal states of the cells in MTCSER are:
DCREQ/ -1 the count of people waiting for the DC
MTREQ/ -1 the count of people waiting for the MTC
MTCUSR/ 0 job number of current MTC user
MTAVAL/ 0 non-zero means wake up someone waiting for MTC
Make sure that nobody has the MTAs assigned or INITed, and use UEDDT to
set the system cells to the above values.
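
A small Python sketch of the check, for anyone who wants to script it.
The cell names and normal values are the ones listed above; the
function name and the idea of holding the current values in a
dictionary are hypothetical (the real values have to be read and
deposited with UEDDT).

```python
MINUS_ONE = 0o777777777777  # -1 in a 36-bit word

# Normal values from the list above.
NORMAL = {"DCREQ": MINUS_ONE, "MTREQ": MINUS_ONE, "MTCUSR": 0, "MTAVAL": 0}


def cells_to_fix(current):
    """Return {cell: normal value} for every cell not in its normal state."""
    return {name: want for name, want in NORMAL.items()
            if current.get(name) != want}


# Hypothetical wedged state: job 7 still recorded as the MTC user.
print(cells_to_fix({"DCREQ": MINUS_ONE, "MTREQ": 0, "MTCUSR": 7, "MTAVAL": 0}))
```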
The program DEV[SS,SYS] types out the names of all devices in the DDB chain,
in case you think the chain is broken, or you want to find out the order
of some devices or the number of certain kinds of DDB.
The program CORMON[SS,SYS] types out current core usage parameters.
The program UP2DDT[SS,SYS] runs P2 EDDT from a P1 terminal, with various
options, for debugging the P2 system. You must have the DEV privilege.
The program P2LOAD[1,2] can mark P2 memory up or down, reload P2 from any
DMP file, and/or tell what users are using P2 devices. The command
P2LOAD, given from the CTY, runs P2LOAD with the defaults of reloading P2
from P2SYS.DMP[S,SYS]. When run with the command RUN P2LOAD[1,2], the
default is NOT to reload P2, just to report any users of P2 devices (which
is always done). Defaults can be overridden by ending the monitor command
line with a semicolon and then any desired filename (for reloading from)
and/or switches. The available switches are:
/L reload P2 (needed unless P2LOAD command is given on CTY)
/N don't reload
/D mark P2 memory down (P1 system will use some other memory for
P2 variables, and P2 won't be able to see that memory P1 is using)
/U mark P2 memory up
/Q don't ask about P2 memory really being OK (when marking it up)
/? report status of P2 memory
/B BLT zeroes throughout P2 memory.
/B=oooo BLT the octal 36-bit constant oooo throughout P2 memory.
Here are some example commands:
P2LOAD ;load the normal system, P2SYS, into P2 memory
P2LOAD;? ;report state of P2 memory (down means the KL doesn't touch it)
P2LOAD;P2DIAG/Q ;load the diagnostic program P2DIAG into P2 memory
P2LOAD;/B ;fill P2 memory with zeroes
P2LOAD;/B=1 ;fill P2 memory with words containing 1's
P2LOAD;/B=777777777777 ;fill P2 memory with words containing -1's
("P2LOAD" can be abbreviated to "P2".) Loading anything but P2SYS into P2
memory makes P2LOAD mark P2 memory as down, so that the KL won't get
confused.
A Tale of Two CTYs
This writeup explains how to set up a WAITS CTY on the normal TTY scanner,
either in place of, or in addition to, the normal CTY. This feature
currently is not available on the CCRMA F2 version of WAITS, although it
could be made to work there with only a small amount of effort.
Assume that you want to make DCA port 55 be the CTY, and assume the
TTY number of the CTY is 160. If not, alter these instructions
correspondingly.
(0) Make sure that the system doesn't think TTY55 is a display. If it
does, do a TTY TTY55 NO DM command (or just TTY NO DM on TTY55). Make
sure that worked. (You should clear DMLIN in LH of LINBIT+55.)
(1) Hopefully the DCA port will already have been set up by the system
to run at the right speed; if not, TTY TTY55 SPEED nnnn, where nnnn is
the correct baud rate, will set the speed. This may not work if you
have already done step (2) below. If you have done that step and the
above command doesn't work, then try TTY TTY160 SPEED nnnn.
(2) Set the low 7 bits of DCATAB+55 to 160 instead of 55.
(3) Set the right half of LINBIT+160 to 400055 instead of 0.
(4) Set the cell SCNCTY to 55 instead of -1.
(5) If you want duplicate output on both the real CTY and the scanner CTY,
then zero the cell SUPCTY (else leave it -1). Duplicate output may not
work for normal system output to the CTY as a plain terminal, but it
should work for EDDT output being duplicated on both kinds of CTY.
(6) If you want EDDT to accept parallel input from both the real CTY and
the scanner CTY, then zero the cell SUICTY (else leave it -1). This only
affects EDDT input. Normal monitor/user input from the real CTY cannot be
suppressed with SUICTY.
Now normal TTY and EDDT type-in and type-out should work from the
scanner CTY of your choice. To undo this, restore the normal values
(from the "instead of" phrases above) changed in steps (2) through (4)
above. The values of SUICTY and SUPCTY are irrelevant if SCNCTY is -1.
(7) The command CTY 55<cr> given on (any) CTY will execute step (4).
The command CTY$ given on (any) CTY, where $ is altmode, will undo
step (4). (It is possible that this might screw up leaving TOIP set
in the CTY's TTY DDB, which means you won't get any output!) These
commands can alternatively be given from a privileged (DEV priv) job.
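
The three deposits in steps (2) through (4) amount to the following
field updates, shown here as a minimal Python sketch. The function name
is made up, and generalizing 400055 to "400000 plus the port number"
for other ports is an assumption drawn from the single example above.

```python
def scanner_cty_patches(dcatab_word, linbit_word, port=0o55, cty=0o160):
    """Return new values for DCATAB+port, LINBIT+cty and SCNCTY."""
    # Step (2): low 7 bits of DCATAB+port become the CTY's TTY number.
    new_dcatab = (dcatab_word & ~0o177) | cty
    # Step (3): right half of LINBIT+cty becomes 400000 + port (assumed form).
    new_linbit = (linbit_word & ~0o777777) | 0o400000 | port
    # Step (4): SCNCTY holds the port number instead of -1.
    return new_dcatab, new_linbit, port


d, l, s = scanner_cty_patches(0o55, 0)
print(f"DCATAB+55/ {d:o}   LINBIT+160/ {l:o}   SCNCTY/ {s:o}")
```

Only the named fields are changed; whatever is in the rest of each word
is carried over untouched.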
WSYNC
ME - If the device is not a TTY, then WSYNC will return either
(1) immediately, if IOACT (or DEVSBB if DEVIBF bit on for internal system
buffer) is already off or
(2) when IOW (which WSYNC sets) is cleared, and not before.
So you set IOACT and call WSYNC. Interrupt level calls SETIOD or STTIOD
and the job will be awakened.
How To Find a bad RAM chip in the Ampex DC830 disk controller.
(This is an excerpt from WIZARD.TXT[SS,SYS].)
%0 - Before you start
Before the controller will let you alter the contents of any registers,
you must put the MODE switch in the position labelled "CE NORMAL". Be
sure you put it back in the (non-CE) NORMAL position when you are done.
%1 - Determine that the controller has a bad memory chip
If the controller halts with a "Check 1" error, there is a good chance
that a memory chip has gone bad. If the "Check 1" light is on, switch the
"Register Display" rotary switch to the "Check 1 register" position. If
the 0 and 10 bits, or the 0 and 9 bits are on, then one of the RAM chips
has probably died.
%2 - General discussion: How to examine a word in RAM
A microinstruction in the controller is a 32 bit word. To look at a
word in memory, you must load the IAR with the desired address and look at
the four bytes of that word sequentially.
To load the IAR, set the desired address in the address switches, set the
Register Display switch to "IAR" and push the Execute switch. Notice that
the data in the address lights now correspond with the data in the
switches.
The data lights are now displaying the data of byte 0 (the high order
byte). To see the contents of all four bytes, sequence the low order
address switches in the canonical "00 01 10 11" cycle. The data in the
lights will change as you sequence the switches, and you don't have to
push Execute or anything.
%3 - Examine the IAR
When the controller stops with a check 1 error, the IAR contains the
address of the instruction that was being fetched when the error was
detected. Put the Register Display switch in the position labelled "IAR"
and look at all four bytes at that address. (See step %2 on how to
examine this location.) Write down the contents of the IAR and the four
bytes of data found there. Note the parity bit in the data word, and
be sure to note if there is bad parity in any of the four bytes.
%4 - Examine the BAR
The BAR is the Backup Instruction Register. It contains the address of
the previously executed instruction. Switch the Register Display selector
to "BAR" and put the address you find there in the address switches. Move
the Register Display selector back to the "IAR" position, and do step %2.
Again, write down the address and the four bytes of data, noting bad
parity in any of the four bytes.
%5 - Examine the DAR
The DAR is the Data Instruction Register. It contains the address of the
most recent data fetch or store. Switch the Register Display selector to
"DAR" and put the address you find there in the address switches. Move
the Register Display selector back to the "IAR" position, and do step %2.
Again, write down the address and the four bytes of data, noting bad
parity in any of the four bytes.
%6 - If you failed to find any bad parity
Failing to find bad parity is usually due to a soft error in the RAM,
which will often disappear when you try to examine it. The best you can
do is log the contents of the IAR, BAR, and DAR, and thus hope to get some
statistics on which addresses are losing. Return the mode switch to the
(non-CE) "NORMAL" position and continue the system.
%7 - If you found bad parity
Here is where the fun begins. You must find the microcode listing for the
controller you are working on (there are two, and they are different).
Get the huge orange notebook labelled "DC830 Microword listing" which
corresponds to the sick controller.
Look up the address that has bad parity associated with it. Note the
discrepancy between the data that you found and the data located at that
address. With luck, you will find a single bit difference, in which case
you may proceed to section %8, "Finding the bad chip". If none of the
bits are different, the RAM that stores the parity bit may be bad, in
which case you should read section %9, "Finding the bad Parity chip". If
the data are completely different, something else may be wrong, or perhaps
you did something wrong.
%8 - Finding the bad chip
At this point, you know the losing address, the byte within that address (0:3),
and the data bit within the byte (0:7). Finding the bad chip is now a simple
three step process, which locates the board, row, and column of the bad chip.
%8.1 - Which board?
Given the losing byte and which data bit, use the table to find out
which PC card has the losing RAM chip:
--------------------
byte bit card
--------------------
0 0:3 C04
0 4:7 C05
1 0:3 C07
1 4:7 C08
2 0:3 C09
2 4:7 C10
3 0:3 C11
3 4:7 C12
%8.2 - Which row?
This one is tougher. Write out the losing address in binary,
and use bits 2:5 to index the table below to find the row number
of the losing RAM ("X" means "don't care".)
------------------------------
losing address
0000 0000 0011 1111
0123 4567 8901 2345 row
------------------------------
XX00 00XX XXXX XXXX 15
XX00 01XX XXXX XXXX 14
XX00 10XX XXXX XXXX 13
XX00 11XX XXXX XXXX 12
XX01 00XX XXXX XXXX 16
XX01 01XX XXXX XXXX 11
XX01 10XX XXXX XXXX 10
XX01 11XX XXXX XXXX 09
XX10 00XX XXXX XXXX 08
XX10 01XX XXXX XXXX 07
XX10 10XX XXXX XXXX 06
XX10 11XX XXXX XXXX 05
XX11 00XX XXXX XXXX 02
XX11 01XX XXXX XXXX 04
XX11 10XX XXXX XXXX 03
XX11 11XX XXXX XXXX 01
%8.3 - Which column?
Given the losing data bit, use the table below to find out which column
the chip is in.
-----------
bit col
-----------
0(msb) D
1 C
2 B
3 A
4 D
5 C
6 B
7 A
You now know the card, row and column of the bad RAM. Power the controller
down, and replace the chip. When you power the controller back up, it
will start up again on its own (at least it should).
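
The three table lookups in %8.1 through %8.3 can be written as a short
Python sketch. The function names are made up. The sketch follows the
tables exactly as printed above, with bit 0 taken as the high-order bit
of both the address and the data byte; for the address used in the %10
example it yields row 03 and column C, which is not what the prose of
that example says, so verify the bit numbering against the controller
documentation before pulling a chip.

```python
# %8.1: card for each (byte, half of byte); half 0 is bits 0:3, half 1 is bits 4:7.
CARDS = {(0, 0): "C04", (0, 1): "C05", (1, 0): "C07", (1, 1): "C08",
         (2, 0): "C09", (2, 1): "C10", (3, 0): "C11", (3, 1): "C12"}
# %8.2: row for each value of address bits 2:5 (bit 0 is the high-order bit).
ROWS = [15, 14, 13, 12, 16, 11, 10, 9, 8, 7, 6, 5, 2, 4, 3, 1]
# %8.3: column for each data bit, bit 0 = msb.
COLUMNS = "DCBADCBA"


def dropped_bit(found, expected):
    """Return the one differing bit (0 = msb), or None if not exactly one."""
    diff = found ^ expected
    if diff == 0 or diff & (diff - 1):
        return None
    return 7 - (diff.bit_length() - 1)


def locate_chip(address, byte, bit):
    """Return (card, row, column) for a bad data bit, per the printed tables."""
    card = CARDS[(byte, bit // 4)]
    row = ROWS[(address >> 10) & 0o17]  # bits 2:5 of a 16-bit address
    return card, row, COLUMNS[bit]


bit = dropped_bit(0x2F, 0x6F)
print(bit, locate_chip(0x38EC, 2, bit))
```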
%9 - Finding the bad Parity chip
Come here if the parity bit seems to be losing for a certain address.
You know the losing address, and which byte within that address has bad
parity.
%9.1 - Which board?
Easy. It's always C02.
%9.2 - Which row?
This is the same as in %8.2, "Finding the bad chip".
%9.3 - Which column?
There is one column for each byte. Use the table below to find it.
For example, if byte 2 showed bad parity, then the corresponding
RAM parity chip is on column B.
-----------
byte col
-----------
0 D
1 C
2 B
3 A
%10 - An example.
Suppose we found the following. The BAR contains 38EC, and we notice that byte
2 has bad parity. We write it all down, and consult the microcode listing.
--------------------
addrs b0 b1 b2 b3
--------------↓↓---- (↓↓ to remind us that byte 2 had bad parity)
We found 38EC 1A B0 2F 06
Should be 38EC 1A B0 6F 06
So it is dropping bit 1 in byte 2 of address 0011 1000 1110 1100. Consulting
the tables, the byte and bit tells us it is board C09. The address tells us
it is row 09, and the bit tells us it is column D. So, on board C09, we
replace 09D. Make sure you plug the chip in the right way! Alternating rows
have opposite orientations.
∂07-Nov-82 1831 ME new DART up, w/RETRY <number>
I have modified DART so that it will re-dump all the files that were dumped
last on one of the unreadable tapes. Such files will be re-dumped twice,
so the next two PDUMPs can be expected to be very big.
Also, I have modified the RETRY command (used for resuming a PRESTORE in
the middle of a tape after an "illegal format" error) so that if you give
it a number, such as RETRY 20, it will retry automatically after that
many more "illegal format" errors. This should make it a little easier
to do PRESTOREs, since they'll require less intervention. The automatic
retry is done starting with an ADVANCE unless a file was being written
from tape to disk at the time of the error, in which case a BACKSPACE will
be done and then the automatic retry. In certain cases, the illegal
format error may cause Dart not to notice the very file it is looking for,
in which case it will blindly advance on to the next file on the tape,
missing the desired one. If you notice this, you can type ESC I to stop
Dart at the next error (clearing the automatic retry count), so that you
can do some manual BACKSPACing and then a manual RETRY.
Finally, it is now possible to stop a TLIST quickly and cleanly (at the
end of a magtape file) by typing ESC I. Dart will respond with "TLIST
stopped" when it reaches the next end-of-file mark. Note that the TLIST
command doesn't clear the ESC I flag, so if you type ESC I before the
TLIST, it will stop after one magtape file (possibly several disk
filenames), at which point the flag has been cleared, so you can give
another TLIST command that won't stop immediately.
SYSSAV: To save a copy of the running system in a file, do the following:
.AL T,SYS ;this is where we keep these saved copies
.RU SYSSAV[SS,SYS] ;writes WAITS.XPN on the alias area
.R FILEX
*WAITS.nnn/D←WAITS.XPN ;writes a .DMP file (because of the /D) as WAITS.nnn,
;where nnn should be the next unused decimal number for
;saved WAITS versions on [T,SYS]. To figure out this
;number (since we don't keep all these files on the
;disk), do a LOCATE WAITS.*[T,SYS] and add one to
;the highest numbered file listed.
;Now CALL out and delete the .XPN file.
.DEL WAITS.XPN ;delete this unneeded form of the saved file
You can now FRAID WAITS.nnn to poke at the saved system with its symbols.
Note that SYSSAV copies the system's symbols into the WAITS.XPN file from
the current WAITS.DMP[S,SYS]. Core above 700000 isn't saved, as that is
where the symbols go in the saved file.
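
Picking the next nnn is just "highest number plus one". A minimal
Python sketch, assuming the file names from the LOCATE output have
already been collected into a list (the function name and that input
format are hypothetical, as is starting at 1 when no numbered file is
found):

```python
import re


def next_save_name(listing):
    """Given names like WAITS.41, return the next unused WAITS.nnn."""
    numbers = []
    for name in listing:
        match = re.fullmatch(r"WAITS\.(\d+)", name.strip().upper())
        if match:
            numbers.append(int(match.group(1)))
    return f"WAITS.{max(numbers, default=0) + 1}"


print(next_save_name(["WAITS.DMP", "WAITS.41", "WAITS.42"]))
```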
∂09-Sep-83 1027 JJW@S1-A Remind queue
Received: from S1-A by SU-AI with TCP/SMTP; 9 Sep 83 10:27:00 PDT
Date: 09 Sep 83 0919 PDT
From: Joe Weening <JJW@S1-A>
Subject: Remind queue
To: ME@S1-A
There is a message in S1-A's remind queue from OTA to a destination at CMUA,
which has the MILnet address 26.1.0.14, but didn't get delivered during the
MILnet split yesterday. So now it needs to be changed to 10.1.0.14 in the
queue. Could you do this, or tell me how to?
ME - OK, here's what you do. (I'm telling you this as you'll have to do
it again next week, both at S1-A and Sail, since I'll be gone, and you'll
have to do something like this on 4 Oct when the real switch happens.)
This same procedure, more or less, can be used to fix up queued mail for
a host that has just changed its host number (such as SRI-AI just did).
(1) First, FTP over CANCEL.DMP[MAI,SYS], a slightly hacked version of
CANCEL (it has RAID in it so that I could set the flag HNUMB and clear the
cell DEBNAM -- you could just compile CANCEL there and diddle those two
cells); setting HNUMB nonzero makes CANCEL list only messages queued for
the network, and clearing DEBNAM lets you run the program despite having
done step two below.
(2) So that you can diddle the queue without interference, create the file
DEBUG[RMD,SYS] (just an empty file is fine, or put a message in it saying
what you are doing to the queue); this file locks out the CANCEL and RETRY
user commands.
(3) Delete ↓<rmnd>↓.dmp[rmd,sys] to lock out the phantom.
(4) Check to see if the phantom is currently running and wait for it to
go away.
(5) RUN CANCEL[MAI,SYS] (the special version). If it lists anything of
your own, just CALL out; otherwise it will EXIT after listing nothing.
Now type REENTER. CANCEL will say "PPN="; type <RETURN>. CANCEL will
list the first queued net message, but in addition to the host name,
it will show the host number in both octal halfword format and standard
decimal dotted form. Now type "999<return>" to get CANCEL to show the
short form of all the remaining queued net messages. This lets you see
about how many queue entries you have to fix and what they look like.
The ones you have to fix are those that don't show a host name at all
(CANCEL uses the current host table) but instead list the host number
in the host name field. These will, next Friday, all be 26...-type
numbers, and on 4 Oct they'll be 10... numbers.
(6) FRAID REMQUE.QUE[RMD,SYS]/N/W -- this is the file containing the
data for all queue entries, 10 (octal) words per entry (the first 10
words of the file hold special general info). Now say, to FRAID:
-1,,αβM ;sets search mask to only care about the left halfword
3200,,αW ;searches for host numbers that specify net 26
Now change each such host number (in the left half) from 3200,,x to
1200,,x (on 4 Oct, you'll have to see which hosts changed permanently in
the other direction, changing 1200,,x to 3200,,x; you also might check for
some of these to do on 15 Sep, in case some mail is queued just before the
next test starts, but I wouldn't worry about doing that, since it's fairly
unlikely and only means a one-day delay for such mail). Note that if αW
leaves a "*" on the top (the prompt) instead of "OK", then it hasn't yet
displayed all the matches of the search -- after you fix those on the
screen, give the "∨" command to continue the search (one or more times,
fixing entries found each time before continuing).
When changing the net numbers this way with FRAID, make sure that you're
not changing some other remind queue data that only looks like a net number.
I haven't actually seen any such data yet, but it is possible. All the
host numbers will be in words whose address ends in 2 (and if it looks like
a host number in such a word, you can assume it is -- that word will be
sixbit if not a host number).
Now say αβE to get out of FRAID.
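
The edit made by hand in step (6) can be summarized in a minimal Python
sketch. It is only an illustration of the rule (left half 3200 becomes
1200, in words whose address ends in 2, past the first 10 octal words);
the function name and the list-of-integers view of REMQUE.QUE are
hypothetical, and it is no substitute for looking at each entry in
FRAID as described above.

```python
NET26, NET10 = 0o3200, 0o1200  # left-halfword values quoted in step (6)


def renumber(words, old=NET26, new=NET10):
    """Return a copy of the queue words with old-net host numbers changed.

    words is a list of 36-bit integers indexed by file address.  The
    first 10 (octal) words are general info and are skipped; after
    that, only words whose octal address ends in 2 are considered.
    """
    fixed = list(words)
    for addr, word in enumerate(words):
        if addr >= 0o10 and addr % 8 == 2 and (word >> 18) == old:
            fixed[addr] = (new << 18) | (word & 0o777777)
    return fixed


queue = [0] * 0o30
queue[0o12] = (NET26 << 18) | 0o200016  # entry still pointing at net 26
queue[0o22] = (NET10 << 18) | 0o5       # entry already on net 10
fixed = renumber(queue)
print(f"{fixed[0o12] >> 18:o},,{fixed[0o12] & 0o777777:o}")
```

Calling renumber(queue, old=NET10, new=NET26) gives the opposite
direction mentioned for 4 Oct.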
(7) RUN CANCEL[MAI,SYS] like before to make sure you corrected everything;
there should be no unknown host numbers showing up this time. If you made
a mistake in step (6), just reiterate (6) and fix it.
(8) When everything looks good from CANCEL, delete DEBUG[RMD,SYS] and
then DO RF (putting up a new remind phantom DMP file from SYS:MAIL,
using whatever method you use when you've changed \FORWARD).
(9) Just for fun, R MAIL<cr>WAKE<cr> to wake up the new phantom, and
see what it does using WHO.